50 ◾ Bioinformatics
together with their gene annotations, which can be used during read alignment/mapping as guides against which new genomes can be assembled quickly. A reference genome of an organism is a curated sequence built from the DNA of several normal individuals of that organism. Reference genome curation was pioneered by
the Genome Reference Consortium (GRC), which was founded in 2008 as a collaboration of
the National Center for Biotechnology Information (NCBI), the European Bioinformatics
Institute (EBI), the McDonnell Genome Institute (MGI), and the Wellcome Sanger Institute
to maintain and update the human and mouse genome reference assemblies. Today, the GRC
maintains the human, mouse, zebrafish, rat, and chicken reference genomes. Reference
genomes of other organisms are curated by specialized institutions, including the NCBI and many others, which manually select genome assemblies to serve as standard or representative sequences, such as those in the NCBI Reference Sequence (RefSeq) collection, against which data from individuals of those organisms can be compared. In this scheme, each eukaryotic species has a single reference genome, whereas a prokaryotic species may have multiple reference genome sequences. The NCBI selects reference genomes from its RefSeq assemblies, which are derived from assemblies deposited in the GenBank database. If a eukaryotic species has no RefSeq assembly, the best GenBank assembly for that species is selected as its representative genome. Viruses, too, may have more than one reference genome per species. Updating a reference genome is generally
a continuous process, and a new version, usually called a “Build,” may be released whenever
new information emerges. A release may also be accompanied by annotations: well-curated reference genomes, such as those of humans and other model organisms, are usually released with gene and variant annotations. Reference genomes are available on the NCBI website in both FASTA and GenBank file formats. Several annotation files may be
FIGURE 2.1 Human reference genome on the NCBI Genome page.